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ABSTRACT 

This  paper  describes  an  algorithm  for  implementing  a  nonlinear  p 
gramming  system  on  the  array  computer  ILLIAC  IV.   A  brief  review  of  existing 
algorithms  is  made  to  determine  the  feasibility  of  using  a  particular  method. 
The  choice  for  the  ILLIAC  IV  is  a  combination  of  the  sequential  unconstrained 
minimization  technique  and  the  projected  gradient  method.   The  advantages  of 
this  combined  method  are  discussed.   Problems  associated  with  implementation 
of  the  algorithm  on  the  machine  are  discussed,  and  an  overview  of  the  system 
is  presented. 
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NOTATION 

A  =  Approximation  to  Hessian  matrix 

C  ,  C  =  vector  components  of  the  gradient  of  the  objective  function 

d  =  step  length 

D  =  direction  vector 

E  =  symmetric  s  by  s  matrix 

E  =  Euclidean  r-dimensional  space 

f,  =  k-th  current  solution  of  the  objective  function  f(X.  ) 

g. (X)  =  i-th  constraint  equation 

G  =  hypersurface  of  the  intersection  of  s  active  constraints 

h(Y)  =  penalty  function 

H(X,  p)  =  revised  objective  function 

i  =  index  for  constraints 

In  -  H(X,  p)  =  infeasible  revised  objective  function 

j  =  index  for  activities 

J  =  set  of  equality  constraints 

k  =  current  solution  index 

K  =  set  of  inequalities  greater  than  zero 

I  =   length  of  feasible  region 

L  =  Hessian  matrix 

L  =  set  of  inequalities  less  than  zero 

m  =  total  number  of  inequalities 

n  =  total  number  of  activities 

N  =  unit  vector  which  points  in  direction  of  the  projected  gradient 

p  =  penalty  constant 

P  =  projected  matrix 

P  V  f  =  projected  gradient 


vilj 


q  =  slack  variables  in  exterior  approach 

R  =  test  vector  in  projected  gradient  method 

r  =  total  number  of  constraints 

s  =  total  number  of  active  constraints 

U  =  Matrix  used  in  Gradient  Projection  Algorithm 

V  =  symmetric  s  by  s  matrix 
X  =  current  solution  vector 
X*  =  local  optimum 

V  =  correction  into  feasible  region 

a   =  test  for  projected  gradient  method 
[3.,  =  i-th  position  of  P,  vector 

7  =  test  for  rate  of  convergence 

5  =  measure  of  closeness  to  a  hypersurface 

e  =  step  length  in  conjugate  direction  method 

V  =  gradient  -  a  column  vector  of  first  partial  derivatives 
v  =  matrix  of  second  partial  derivatives 

A  =   forward  difference  operator 
=  Euclidean  norm 
=  transpose  of  a  matrix 


1 .      INTRODUCTION 

1.1  The  Object  and  Scope  of  the  Study 

The  purpose  of  this  investigation  is  to  establish  a  nonlinear  pro- 
gramming system  for  the  parallel  array  computer  ILLIAC  IV.   The  nonlinear 
system  is  to  supplement  the  linear  system  now  being  developed.   Its  capabil- 
ities range  from  small  decidedly  nonlinear  problems  to  large  quasi-linear 
problems.   The  bound  of  the  larger  problems  is  of  the  order  of  ten  thousand 
constraints,  while  the  smaller  highly  nonlinear  problems  rarely  exceed  one 
hundred  constraints.   This  wide  variance  makes  it  difficult  to  apply  one  of 
the  existing  algorithms. 

This  report  presumes  an  understanding  of  the  concepts  of  ILLIAC  IV 
and  mathematical  programming.   The  uninformed  reader  should  refer  to  the  list 
of  references  for  background  material;  however,  the  author  recommends  a  prior 
knowledge  of  linear  programming. 

At  the  present,  nonlinear  programming  is  a  hodgepodge  of  specific 
examples.   A  unified  theory  has  not  yet  been  devised  and  appears  unlikely  to  be 
in  the  near  future.   Particular  methods  have  arisen  which  solve  very  special 
problems.   The  system  designer  must  account  for  this  diversity  and  plan  accord- 
ingly.  The  path  is  not  an  easy  one  and  many  questions  must  be  answered.  What 
is  the  range  of  problems  that  the  system  will  solve?  Is  the  local  optimum 
satisfactory?  How  stable  is  the  stationary  solution?  These  types  of  questions 
are  examined  in  this  report. 

The  scope  of  the  system  is  not  limited  to  convex  programming.  An 
attempt  is  made  to  find  many  local  optima  so  that  the  probability  of  determining 
the  global  optimum  is  increased.   Since  no  one  has  discovered  a  technique  for 
finding  global  conditions  for  general  nonconvex  problems,  this  deficiency  must 
be  accepted.   Parallel  computing  handles  the  problem  nicely,  as  is  demonstrated 
in  a  subsequent  chapter. 


1.2  Organization  of  the  Report 

The  remainder  of  this  chapter  presents  the  nonlinear  programming 
problem  per  se.   A  review  of  the  existing  techniques  is  made  and  the  procedure 
chosen  is  the  best  with  regard  to  the  range  of  applications  and  the  structure 
of  ILLIAC  IV.   The  advantages  and  disadvantages  are  briefly  explained. 

Chapter  2  describes  the  Sequential  Unconstrained  Minimization  Tech- 
nique (SUMT)  in  detail.   Various  penalty  functions  are  examined  and  the  differ- 
ences weighed.   The  interior  and  exterior  theories  are  discussed,  as  well  as 
I  ■  blems  with  convergence.   Unconstrained  methods  are  looked  at  and  an  example 
is  presented. 

Chapter  3  involves  the  projected  gradient  method.   The  essential  for- 
mula is  proven,  and  a  sample  problem  is  shown.   Problems  with  step  length  and 
zigzagging  are  pointed  out  and  a  search  for  the  best  step  length  is  covered. 
The  Combined  Method  algorithm  is  discussed  in  Chapter  k. 

The  implementation  of  the  algorithm  on  ILLIAC  IV  is  discussed  in 
Chapter  5.   The  efficiency  of  machine  usage  is  observed  with  an  eye  to  the  rate 
of  convergence.   The  details  of  parallel  computations  are  discussed  so  that  a 
realistic  code  can  be  programmed.   The  coding  requirements,  such  as  storage 
schemes,  are  presented  and  their  importance  in  an  array  environment  is  stressed. 

The  results  of  the  study  are  listed  in  the  summary.  Finally,  the  re- 
port ends  with  an  overview  of  the  system  and  a  recommendation  for  future  imple- 
mentation. 

1. 3  Statement  of  the  Problem 

The  general  mathematical  programming  problem  is  to  minimize  the  objec- 
tive function  in  n  variables 
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■  f(V  \  =  (*ik>  •  •  •  >  *„*> 


subject  to  the   constraints 


gi(xk)>bi  1-1,. 


m 


gi(Xk)    =  bi  i  =  nH-1,    .    .    .    ,   r 


b.    >  0 

1  — 


which  form  the  feasible  region  in  r-dimensional  Euclidean  space.   It  is  usually 
unnecessary  to  add  slack  or  artificial  variables  in  nonlinear  problems,  in 
contrast  to  the  simplex  algorithm  of  linear  programming.   The  equality  con- 
straints can  be  changed  into  inequalities  by  adding  the  following  constraints 

MV  -bi  >° 

\    -8l(\)>0 

but  often  they  should  be  dealt  with  separately.   In  this  report,  the  vector 
(X,  )  is  the  k   current  solution. 

The  objective  function  and  constraints  f  and  g.  (i  =  1,  .  .  .  ,  r) 
can  be  either  linear  or  nonlinear,  but  they  must  be  continuous  and  possess  first 
derivatives.   The  difficulty  in  determining  the  solution  is  directly  propor- 
tional to  the  amount  of  curvature  of  the  function.  Hence,  linear  programming  is 
solved  in  the  most  straightforward  fashion. 

The  first  step  into  the  nonlinear  realm  is  to  allow  the  objective 
function  to  be  nonlinear  while  holding  the  constraints  linear.   This  problem 
can  be  solved  by  the  simplex  method  and  is  an  extension  of  that  technique.   The 
linear  constraints  define  a  convex  region,  and  the  method  is  called  convex  pro- 
gramming. 

The  next  step  into  the  nonlinear  is  more  painful,  for  many  reasons. 
The  most  important  problem  is  the  global -local  dilemma.   Once  the  feasible 


region  is  no  longer  convex  the  optimum  solution  to  the  entire  problem  cannot  be 
determined  with  certainty.   A  stationary  point  solution  is  the  best  that  can  be 
achieved,  and  many  local  minima  are  found.   These  local  minima  are  compared  to 
find  the  smallest.   Starting  the  procedure  at  various  points  increases  the  prob- 
ability of  determining  the  global  minimum. 

1.*+  A  Review  of  the  Existing  Algorithms 

Two  methods  are  combined  in  the  design  of  the  system  for  ILLIAC  IV 
because  of  the  difficulty  in  justifying  a  stand-alone  procedure.   The  combina- 
tion is  helpful  in  overcoming  the  global-local  difficulty.   The  advantages  of 
both  the  methods  are  utilized  so  that  a  more  effective  process  results.   Before 
the  "Combined  Method"  (a  name  used  in  the  remainder  of  the  report)  is  analyzed, 
a  review  of  some  of  the  more  popular  algorithms  is  made. 

The  first  technique  to  be  examined  is  the  cutting  plane  method  [9], 
the  central  principle  of  which  is  to  linearize  the  constraints  at  predetermined 
points  and  solve  the  resulting  LP  problem.   The  linearization  is  achieved  by 
applying  first  order  Taylor  series  approximations 

g.(x)  =  g.(xk)  +  ^.(Xjp'cx  -  x^ 

at  the  solution  (X,  ).   If  the  solution  of  the  LP  problem  satisfies  the  original 
constraints,  the  method  is  finished;  otherwise  the  estimates  are  revised  and  the 
process  continues.   The  technique  converges  for  convex  problems,  but  fails  to 
find  a  local  optimum  in  nonconvex  problems.   The  path  leaves  the  feasible  region 
and  never  returns.   The  method  is  not  applicable  in  this  study  because  of  the 
limitation  of  the  range  of  applications  to  convex  problems.   The  author  discounts 
the  accelerated  cutting  plane  technique  for  similar  reasons. 

The  separable  programming  method  is  used  when  the  objective  function 
is  separable;  that  is,  of  the  form 


n 
f(X)  =  Z   f  6c.  ) 

and  the  constraints  are  linear  and  also  separable.   The  method  divides  the 
objective  function  into  a  grid  and  linearizes  it  at  the  intervals.   This  lin- 
earization is  applied  to  each  variable.   The  problem  is  to  minimize  f(X)  subject 
to 


g,(X)  =  Z     g.,(x.)  . 


.  -  -13* "J* 


Here  again  the  procedure  converges  if  convexity  is  retained,  but  fails  otherwise, 
This  method  is  likewise  rejected. 

The  Lagrangian  differential  gradient  method  has  a  different  form  than 
the  previous  methods.   It  utilizes  the  gradient  of  the  objective  function  which 
points  toward  the  greatest  increase  of  the  function.   The  negation  of  the  gradi- 
ent is  used  here.   Knowledge  of  the  constraints  alters  the  direction  according 
to  the  gradient.   Notationally 


dX/dy  =  Vf(X)  -  Z   u.Vg.(x) 
i=l  X  x 


where  u.  is  a  Lagrange  multiplier  and 


f  g, (X)     if  u  >  0  or  g.(X)  >  0 
du/dy  =  (     x 

0        otherwise 


The  solution  to  the  original  problem  is  obtained  from  the  solution  to  the 
differential  equations  by  letting  y  -*  ».   Strict  convexity  must  be  retained,  and 
the  solution  cannot  be  found  if  the  constraints  are  linear.  This  method  has 
been  a  stepping  stone  for  other  gradient  methods. 


The  next  method  to  be  examined  is  the  Sequential  Unconstrained  Mini- 
mization Technique.   It  is  discussed  lightly  because  it  is  one  of  the  methods 
in  the  combined  algorithm.   The  details  are  presented  in  Chapter  2.   The  method 
has  been  perfected  at  the  Research  Analysis  Corporation  by  McCormick  and  Fiacco 
[5]-   Convergence  is  proved  for  convex  regions,  and  will  also  closely  approach 
nonconvex  optima.   An  advantage  is  the  partial  avoidance  of  unwanted  local  minima. 
The  idea  is  to  incorporate  the  constraints  into  the  objective  function  and  thereby 
change  the  constrained  problem  into  a  series  of  unconstrained  minimizations.   A 
penalty  term  is  introduced  to  weigh  the  importance  of  the  constraints.   The  se- 
quence approaches  the  solution  of  the  original  problem  as  the  penalty  is  mono- 
tonically  decreased.   An  advantage  of  the  computational  technique  is  the  natural 
addition  of  an  unconstrained  subroutine,  thereby  greatly  increasing  the  range  of 
possible  applications. 

The  small  step  gradient  method  is  a  well  known  nonlinear  algorithm. 
It  is  similar  to  the  cutting  plane  scheme.   The  small  step  method  attempts  to 
remain  in  the  feasible  region  by  changing  the  solutions  gradually.   The  steps  of 
the  algorithm  are: 

1.  Linearize  all  constraints  by  a  Taylor  series  approximation  at  a 
current  feasible  solution. 

2.  Minimize  w  =  f  (X^)  =  vf  (X^  (X)  with  the  current  solution  =  X, 
subject  to   Vgi(Xk)'  (\)    =  Vgi(Xk)'(Xk)  -  f(Xk) 

and   ||  (X)  -  (X  )||  =  OL,   with  a  a  small  number. 

3.  Test  for  an  optimum  value  and  return  to  step  1  if  the  test  fails. 

Any  familiar  LP  method  is  used  to  solve  step  2  above.  Notice  that  no  informa- 
tion carries  from  iteration  to  iteration  because  of  the  relinearization  of  all 
constraints.   The  number  of  computations  is  increased.   This  method  converges 
very  slowly  and  is,  therefore,  rejected. 


7 

The  final  method  to  be  e;  amined  is  the  gradient  projection  method  of 
Rosen  [12].   It  is  a  subset  of  the  general  method  of  feasible  direction,  whi 
attempts  to  decrease  the  objective  function  at  every  step,  and  is  the  second 
portion  of  the  Combined  Method.   Therefore,  the  description  here  is  brief  a:: 
it  will  be  described  more  fully  below.   The  method  begins  with  an  initial  fea- 
sible interior  solution  and  follows  the  negative  gradient  to  the  boundary  at 
which  point  the  projection  is  determined.   The  projected  direction  is  not  nec- 
essarily the  best,  as  in  Zoutendijk's  method  [15]  which  solves  an  LP  problem  to 
determine  the  direction  of  greatest  change  within  the  region;  however,  a  decrease 
of  the  objective  function  is  guaranteed  and  an  LP  problem  is  not  formulated  at 
each  iteration.   The  method  converges  in  a  small  number  of  iterations,  especially 
with  linear  constraints.   Although  the  amount  of  work  per  step  is  large,  the 
method  is  acceptable  and  is  even  shown  to  be  desirable  for  the  ILLIAC  IV. 

1. 5  A  Description  of  the  Combined  Method 

As  mentioned  in  the  previous  section,  a  combination  of  two  existing 
methods  is  used  in  the  nonlinear  system  design.  The  sequential  unconstrained 
minimization  technique  is  used  for  the  small  highly  nonlinear  problems  and  as 
a  first  approximation  to  the  projected  gradient  method.  The  latter  method  is 
used  exclusively  with  large  quasi-linear  problems  and  in  speeding  convergence 
in  the  combined  mode. 

The  stages  of  the  Combined  Method  are  as  follows: 

1.   Phase  one  of  the  procedure  involves  the  determination  of  an  initial 
feasible  solution  if  one  does  not  already  exist.   It  is  often  the 
case  that  a  feasible  solution  is  known  a  priori  because  of  the 
nature  of  the  problem. 
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2.  The  unconstrained  revised  I'unction  is  determined  by  the  original 
constrained  problem.   The  function  depends  upon  the  current  solu- 
tion, and  an  arbitrary  penalty  function  and  constant. 

3.  The  unconstrained  function  is  minimized  for  a  particular  penalty 
constant.  An  attempt  is  made  to  find  all  local  minima  of  the 
converted  function.   Tests  are  performed  to  check  feasibility. 

k.      The  solution  of  the  unconstrained  problem  becomes  the  current 

solution,  and  a  test  for  the  rate  of  convergence  is  made.   If  the 
rate  is  adequate,  the  penalty  is  lowered  and  the  process  returns 
to  step  three.   The  test  for  convergence  is  part  of  the  locking 
test. 

5.  The  projected  gradient  method  is  called  into  the  program  if  the 
locking  test  indicates  that  it  would  be  appropriate.   The  method 
determines  a  feasible  direction  and  then  a  step  length  which 
minimizes  the  objective  function  violating  the  constraints. 

6.  The  locking  test  is  made  again.   The  test  includes  a  check  for 
local  optimum  as  well  as  the  rate  of  convergence.   The  optimum  is 
found  within  a  delta  region  because  of  the  nature  of  the  algorithm. 
It  is  possible  to  return  to  the  sequential  method  if  conditions  so 
warrant. 

7-   This  step  is  included  for  nonconvex  regions.   Once  a  local  minimum 
is  found  by  the  above  process,  the  entire  procedure  is  reiterated 
from  another  feasible  position.  Many  optima  are  found  in  this 
manner  and  the  probability  of  finding  the  global  is  improved. 

These  steps  give  a  rough  understanding  of  the  Combined  Method.   The  prime  advan- 
tage of  this  algorithm  is  the  wide  range  of  applications  which  attempt  to  find 
global  optimum  of  nonconvex  regions  in  a  few  iterations.   The  remainder  of  the 
report  expands  this  assertion. 


2.   THE  SEQUENTIAL  UNCONSTRAINED  MINIMIZATION  TECHNIQUE 

Formulation  of  the  Method 

The  essence  of  the  method  is  concerned  with  the  conversion  of  the 
objective  function  and  constraints  of  the  constrained  problem  into  an  uncon- 
strained problem  with  a  revised  objective  function,  which  is  minimized.   A 
sequence  of  conversions  and  minimizations  is  computed  which  converges  to  the 
solution  of  the  constrained  problem  as  the  number  of  iterations  approaches  in- 
finity.  The  procedure  terminates  at  a  delta  closeness  to  the  optimum  which  is 
defined  by 


S         2    2 
i=l   x 


for  the  s  constraints  which  lie  in  the  hypersurface  G.   In  the  unconstrained 
method,  a  test  of  convergence  is  made  to  measure  performance.   The  formula 

'  '  (fk  "  fk+iVfk  . 

where  f  =  f  (X,  ),  is  checked  against  a  constant  which  has  been  predetermined  by 
the  nature  of  the  problem.   A  local  optimum  test  is  made  when  y   becomes  less  than 
the  specified  constant.   The  procedure  halts  or  switches  to  the  projected  gradi- 
ent if  the  minimum  is  not  found. 

The  revised  objective  function  for  the  sequential  method  is  defined  by 


m 


with 


yik  =  e±(\) 
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The  penalty  constant  for  the  k-th  iteration  is  p  ,  and  the  penalty  function  is 

Ihus  the  revised  objective  function  is  also  a  function  of  the  constraints 
when  the  penalty  is  nonzero.   The  function  depends  upon  the  curvature  of  the 
constraints,  the  number  of  local  minima,  and  the  ease  of  calculating  the  con- 
straints.  It  has  the  property  that 

m 
lim  L   Pkh(ylk)  =  0 
k  -* »  i  =1 

and  the  result 

lim  H(Xk,pR)  =  f#  , 
k-*« 

*      •*- 
where  f#  =  f (X  )  and  X  is  the  solution  to  the  constrained  problem. 

The  effect  of  the  conversion  is  the  simulation  of  the  original  prob- 
lem more  and  more  closely  as  the  sequence  progresses.   The  contribution  of  the 
constraints  to  the  revised  objective  function  diminishes  and  vanishes  at  the 
limit. 

2.2  Penalty  Functions 

The  function  h(X)  has  many  forms,  with  inverse  and  logarithmic  forms 
being  the  most  popular.   It  is  possible  to  use  a  weighted  penalty  function  that 
combines  the  two  approaches  of  the  sequential  unconstrained  method.   This  is  dis- 
cussed later  in  the  report. 

A  logarithmic  penalty  function  has  the  form 


m 

H(W  -  fk  -  pk  .Vn  gi(V 

1=1 


in  the  revised  function.   Taking  first  derivatives,  find  the  gradient  of  the 
H  function 
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m 
1=1 


It  is  apparent  that  the  gradient  vanishes  provided  )C  =  X  and  p  =  0.   This 
assumes  the  Kuhn  and  Tucker  condition  that  the  gradient  of  a  function  vanishes 
at  the  optimum  or  f^  =  0.   If  this  assumption  holds,  the  first  order  sufficiency 
condition  for  the  logarithmic  function  is  proved.   The  second  order  condition  is 
proved  in  a  similar  fashion.   Proceeding  to  the  second  derivatives,  we  have 

v^H(xk,pk)  =  v  f  -  Z  [pj/g^)]  v  g.(xk)  +  Z  [i/gi(xk)  ]  v  gi(xk)  vgi(xk). 

i=l  i=l 

2 
A  second  order  condition  for  the  optimum  is  that  the  matrix  V  f*  he  positive 

#  2 

definite.   As  p,  ■•  0,  X  -»  X  ,  and  the  H  function  becomes  equal  to  V  f^j  there- 

?   * 
fore,  V  H(X  ,0)  is  a  positive  definite  matrix.   This  completes  a  second  order 

-ondition  for  the  logarithmic  penalty  function. 

These  proofs  demonstrate  the  nearness  of  the  unconstrained  function  H 

to  the  minimum  of  the  objective  function  f  in  the  constrained  problem,  provided 

a  logarithmic  penalty  function  is  utilized. 

A  second  penalty  function,  called  the  inverse  function,  has  the  form 


m 

H<W  ■  fk +  pk  .V/gi(V 

i=l 


It  can  be  shown,  in  an  analysis  similar  to  the  previous  function,  that  first 
and  second  order  conditions  are  satisfied.   The  first  derivative  is 


m  p 

V  H(Xk,pk)  =  V  fk  -  pk  ^iVgi(Xk)/g.2(Xk) 
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so  the  H  function  has  a  vanishing  gradient  at  X  .   The  limiting  value  of  the 
sequence  of  solutions  to  the  unconstrained  problems  is  the  solution  of  the  con- 
strained problem. 

Note  that  the  penalty  function  increases  without  bound  as  the  con- 
straints approach  zero,  when  p,  is  greater  than  zero.  Hence,  the  boundary 
cannot  be  reached  and  the  equality  constraints  cannot  be  satisfied,  so  the  pro- 
cedure halts  at  a  delta  closeness  to  the  optimal  solution. 

The  inverse  penalty  function  would  appear  to  be  the  more  profitable 
of  the  two  functions  to  use.   The  calculation  of  the  inverse  of  a  constant  is 
trivial  and  the  summation  is  easy.   Although  the  computation  of  the  logarithm 
is  simple,  it  does  require  more  operations  and  consequently  longer  computation 
time  for  the  same  number  of  unconstrained  problems  in  the  solution  sequence.   No 
information  is  available  as  to  which  penalty  function  yields  the  shortest  solu- 
tion sequence  in  actual  problems. 

2.3  Interior  Points 

An  example  is  presented  so  that  the  reader  can  gain  an  understanding 
of  the  sequence  method.   It  also  helps  demonstrate  the  difference  between  the 
interior  and  exterior  points  methods. 

The  problem  is  to  minimize  the  objective  function 


w  =  f (X)  =  xx  +  2x2 


subject  to 


Xl  "  X2  -  ° 


x±  >   0 
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xx  -  x2  >  0 


iA  V2  3A  i  1 

Figure  1.   An  illustrative  example  of  the  interior  points  approach. 


The  revised  objective  function  is 


H(Xk,pk)  =  x±   +  2x2  -  pkln[g1(Xk)]  -  pjInCg^)] 
=  xx  +  2x2  -  Pkln(xx  -  x2)  -  pkln(x1)  . 


Figure  1  demonstrates  the  problem  at  hand,  the  shaded  area  being  the  feasible 
region,  and  the  boundaries  the  equality  constraints.   The  trajectory  of  current 
solutions,  which  are  listed  in  Table  1  and  plotted  in  the  figure,  shows  how  the 
boundaries  are  avoided  and  how  the  interior  is  utilized.   The  revised  function 
is  first  minimized  for  p  =  1  by  taking  partial  derivatives  and  setting  the  sys- 
tern  of  equations  to  zero.   The  penalty  constant  is  lowered  to  one  half  of  the 
previous  constant,  and  the  new  revised  function  is  minimized.   This  process 


Ill- 
continues  until  the  path  of  the  current  solutions  converges  upon  the  origin 
(0,0)  and  halts  within  a  5-closeness.   The  reduction  in  the  penalty  was  arbi- 
trarily chosen.   The  percent  of  reduction  is  large  because  of  the  simplicity  of 
the  problem  and  the  speed  of  convergence.   Note  that  the  path  remains  in  the 
interior  and  the  rate  of  convergence  slows  as  the  optimum  is  approached.   This 
method  is  called  the  interior  points  approach  for  obvious  reasons. 

TABLE  1 
THE  SOLUTION  OF  THE  INTERIOR  PROBLEM 


k 

Pk 

71 

72 

fk 

1 

1 

•  833 

.667 

2.16 

2 

•  5 

.416 

•  333 

1.08 

0 

•  25 

.208 

.167 

0.5^ 

k 

•  125 

.104 

.083 

0.271 

The  computations  of  the  minimum  of  the  revised  function  are  usually 
more  involved  because  the  derivatives  are  harder  to  find,  and  the  system  of 
equations  is  less  straightforward;  therefore  another  method,  such  as  one  of  the 
conjugate  direction  techniques,  is  used. 


2.k     Exterior  Points 

Thus  far,  the  method  (SUMT)  discussed  in  this  chapter  has  dealt  only 
with  the  interior  points  approach.   Another  approach  is  to  seek  the  optimum  from 
an  infeasible  current  solution.   This  is  called  the  exterior  points  method. 

The  idea  is  to  add  positive  slack  variables  to  the  constraints  which 
do  not  meet  the  predefined  inequality  conditions.   This  addition  takes  the  form 
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6i(V  +  \  =  °  x  - i  - m 


in  which  the  variable  q  decreases  as  the  solution  sequence  progresses.   The 

K 

constraints  which  affect  the  results  are  these  relaxed  inequalities.   The  new 
objective  function 


H(Xk,pk)  =  f  +  pk  L   min[0,gi(Xk)]  , 

i=l 


called  the  quadratic  loss  function,  accounts  for  this  relaxation.  The  penalty 
function,  p  ,  is  monotonically  increased  as  the  sequence  of  unconstrained 
problems  is  solved.   Since  as  p  increases,  X,  is  drawn  towards  the  feasible 
region  so  as  to  minimize  the  value  of  the  penalty  term,  X  ->  X  as  p  ->  °°. 

Each  step  consists  of  solving  the  set  of  equations  formed  by  setting 
the  gradient  of  the  new  objective  function  to  zero,  i.e., 


m 
V  H(Xk,pk)  =  V  fk  +  2pk  I  V  gi(Xk)  •  min[0,g;L(Xk)]  =  0. 

i=l 


The  second  order  condition  is  also  fulfilled  because  the  second  derivatives 
form  a  positive  definite  matrix  as  in  the  logarithmic  function  discussed  above 
X  is  a  local  minimum  of  the  converted  function  H(X,  ,  p,  )• 

The  difference  between  the  exterior  method  and  the  interior  method 
arises  because  of  the  form  of  the  revised  objective  function.   The  interior 
method  uses  all  constraints  in  each  iteration,  but  the  exterior  method  calcu- 
lates only  those  constraints  which  do  not  fulfill  the  proper  inequalities. 
The  optimum  of  the  converted  H  function  moves  toward  the  optimum  of  the 
constrained  problem  in  both  cases. 
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An  example  of  the  exterior  method  is  to  minimize 


w  =  f (X)  =  x1  x2 


subject  to 


2 
C2 


gl(x)  =  -xj  -x:  +  k  :>  o 


g2(X)  =  -xj  -x2  +  1  > 
g3(X)  -  x±  +   x2  >  0 


X,  +  X2>0 


Figure  2.   An  illustrative  example  of  the  exterior  points  approach. 


x7 
The  shaded  area  is  again  bhe  feasible  region  with  the  equality  conditions  the 
boundaries.   The  revised  objective  function  with  p  equal  to  .5  '■• 


2  2  2 

H(X  ,.5)  =  rQ  +  1  X{[min(0,x1  +  xg)]   +  [min(0,l  -xQ    -^    )] 

+  [min(0,  -x1  -Xg  +  h)]   }      . 


The  solution  of  this  function  is  listed  in  Table  2  with  the  other  solutions  in 
the  sequence.   The  penalty  constant  p  is  decreased  in  successive  iterations  to 
bring  the  current  solution  closer  to  the  feasible  region.   The  trajectory  of 
solutions  is  plotted  in  Figure  2. 


TABLE  2 
SOLUTIONS  OF  THE  EXTERIOR  PROBLEM 


k 

pk 

xl 

x2 

fk 

1 

1 

•  67 

.89 

.60 

2 

2 

.62 

•  77 

.k6 

3 

k 

.61 

•  73 

M 

h 

8 

•  58 

.67 

.38 

00 

00 

•  577 

.667 

•  385 

The  computations  of  the  minimum  of  the  revised  function  for  the  exte- 
rior method  is  less  involved  than  the  interior  method  in  many  cases.  Note  that 
in  the  example  the  current  solutions  fulfill  the  proper  inequalities  for  con- 
straints 1  and  3-  Hence  these  constraints  are  set  to  zero  in  the  revised  func- 
tion while  the  conditions  are  satisfied.  The  order  of  the  polynomial  is 
increased  by  the  quantity  squared,  which  is  a  disadvantage  from  a  computational 
standpoint. 
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The  exterior  method  is  quite  useful  when  the  minimum  of  the  first 
iteration  X  is  close  to  the  feasible  region  and  the  solution  occurs  on  the 
boundary  of  the  region.   The  convergence  may  be  prohibitively  slow  in  other 
situations.   The  exterior  algorithm  handles  equality  constraints  more  effec- 
tively than  the  interior  method,  because  the  slack  variables  are  taken  care  of 
in  the  quadratic  loss  function. 

The  difference  between  the  interior  and  exterior  methods  is  that  the 
interior  method  uses  the  constraints  to  avoid  regions,  while  the  exterior  method 
follows  the  philosophy  that  the  constraints  keep  the  current  solution  close  to 
the  feasible  region.   Although  the  ideas  are  different,  the  calculations  are 
similar,  except  for  the  number  of  constraints  in  the  revised  function. 


2. 5  Unconstrained  Algorithms 

An  integral  portion  of  the  sequential  method  is  the  minimization  of 
the  unconstrained  problem  using  the  objective  function  H.  Without  a  good  tech- 
nique the  whole  idea  falters.   What  constitutes  a  good  technique?  It  must  have 
good  convergence  properties  and  be  computationally  straightforward.   Global- 
local  difficulties  and  saddle  points  must  be  avoided.   These  problems  resemble 
those  of  the  constrained  problem.   A  multitude  of  algorithms  appear  in  the  liter- 
ature and  have  been  applied  with  varying  degrees  of  success.   The  most  popular 
are  the  method  of  steepest  descent,  the  conjugate  direction  methods,  and  Newton's 
second  order  method. 

The  first  class  of  methods  to  be  examined  is  the  conjugate  direction 
approach.   The  approach  guarantees  to  locate  the  minimum  of  a  function  in  n 
steps,  provided  the  function  is  quadratic  and  possesses  n  arguments;  otherwise 
a  test  of  convergence  is  required.   The  idea  is  to  approximate  the  function  by 
a  quadrat ir:  equation  and  search  along  this  function  until  the  minimum  is  deter- 
mined.  A  subset  of  this  method  is  the  Fletcher  and  Reeves  [7]  proposal.   Their 
algorithm  begins  with  the  vector  (X,  ),  where  k  is  initially  zero,  and  the 
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■t  ion 


DK  *  V  h 


The  next  point  (X   )  is  found  by  determining  the  minimum  value  of  f(X)  along 
this  direction.   When  this  point  is  found,  a  new  direction  is  computed  by  the 


D,  i  =  -V  f,  ,  +  €.  D, 

k+1      k+1    k  k 


where 


£K  "  V  fk+l2/^  fk2 


Tliis  process  continues  for  n  iterations  until  X  ,  the  solution  of  the  quadratic 
problem,  is  found.   An  assumption  is  that  the  calculations  involve  no  roundoff 
or  truncation  errors.   The  generated  directions  correspond  to  the  quadratic 
approximations  and  the  procedure  should  be  restarted  every  n+1  steps  for  general 
functions.   The  principle  quantities  to  be  observed  are:   the  rate  of  conver- 
gence, the  choice  of  the  initial  solution,  the  search  for  the  next  current  solu- 
tion, and  the  test  for  delta  closeness  to  the  optimum. 

Another  conjugate  direction  method  is  the  Davidon  variable  metric 
method  [2].   The  notion  is  to  approximate  the  second  partial  derivatives  in  the 
Hessian  matrix.   It  begins  with  the  arbitrary  vector  X  and  any  positive  definite 
matrix  A.   Successive  points  are  found  through  the  equation 

\+i  -  \  -  ^  fk 

The  step  length  d  is  found  in  a  one  dimensional  search  for  the  local  minimum  of 
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the  objective  function.   The  difference  from  the  previous  method  is  in  the 
matrix  approximation 


with  the  difference  equations 


and 


^  -  -d\v  fk 


^k   ■  *Vl  -  *V 


The  matrix  A,    is  positive  definite  if  A  is  also  positive  definite,  and  the 
function  decreases  appropriately. 

The  advantage  of  the  conjugate  direction  methods  lies  in  the  quadratic 
approximation  to  the  function.   The  method  works  quite  well  for  functions  which 
resemble  quadratic  forms,  but  not  so  well  with  general  functions.   If  the  initial 
point  is  sufficiently  close  to  the  optimum,  the  rate  of  convergence  is  good.   The 
calculations  of  A  are  somewhat  involved.   In  some  cases  it  is  much  better  than 
computing  second  derivatives. 

The  method  of  steepest  descent  is  the  second  type  of  method  to  be  dis- 
cussed.  It  is  an  offshoot  of  the  generalized  gradient  method,  which  points  in 
the  direction  of  greatest  change  of  the  function.   The  negative  of  the  gradient 
is  again  used.   Distance  along  this  gradient  is  maximized  to  lower  the  number  of 
iterations.   The  arbitrary  vector  I  is  used  once  more  to  begin  the  procedure. 
Subsequent  points  are  found  by 
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*kn'h- dv  f* 


where 

d  -  the  step  length 
k  +  1  =  the  new  current  solution 

The  step  length  is  found  by  determining  the  smallest  value  of  d  which  minimizes 
the  objective  function  along  -  Vffc  but  remains  positive.   That  point  becomes 

the  next  current  solution  and  a  new  direction  is  found  that  is  orthogonal  to 

the  previous  direction. 

The  final  method  to  be  examined  is  the  Newton  second  order  method. 
Second  partial  derivatives  of  the  unconstrained  H  function  are  required.   The 
method  is  based  on  the  second  order  Taylor  series 


W  =fk  +  fi!CVfk+  i^'hP* 


where 


\  = 


b   f.  fix  fix.. 

k7   1  ] 


d  f .  fix   dx, 

k'   n  1 


d  f,  fixfix 


k'   1  n 


d^f,  fix   dx 


k'   n  n 


and  L^.  =  L()C  )  is  the  Hessian  matrix.   The  A,  matrix  of  the  Davidon  variable 
metric  method  is  an  attempt  to  duplicate  this  matrix.   It  is  a  positive  definite 
symmetric  matrix.   The  series  of  current  solutions  is  represented  by 


-1. 


\+l  '  \  -  «*  v  f: 
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where 

L  Vf,  =  direction  of  step 

d  =  step  length  ,  chosen  to  minimize  f  along 

-L  V  f,  starting  at  x,  • 
Note  that  the  inverse  of  the  Hessian  matrix  must  exist  for  the  method  to  con- 
tinue.  Also,  the  matrix  must  be  stored,  which  requires  ^-N(N  +  l)  words.   If 
the  derivatives  are  troublesome  to  compute,  the  method  slows  considerably. 
Nonetheless,  the  rate  of  convergence  is  as  good  as,  if  not  better  than,  the 
earlier  two  methods.   Actual  tests  have  been  run  by  Fiacco  and  McCormick  [k~\ 
and  the  results  indicate  that  the  Newton  method  requires  the  fewest  iterations, 

o 

but  the  work  per  step  is  roughly  n  /3,  while  the  variable  metric  method  requires 

2 
n  operations  per  step  and  Fletcher-Reeves  method  requires  n  /2  operations  per 

step.   These  results  indicate  that  the  Newton  method  requires  many  more  compu- 
tations per  step  but  fewer  iterations  are  required  for  convergence;  therefore, 
it  compares  favorably  with  the  other  methods.   Because  of  the  structure  of 
ILLIAC  IV,  these  characteristics  make  the  Newton  method  the  preferred  technique. 
Chapter  k   expounds  on  this  issue  in  more  detail. 

2.6  Advantages 

In  Chapter  2  the  sequential  unconstrained  minimization  algorithm  has 
been  examined.   To  summarize,  the  advantages  of  the  method  are  listed  below. 

1.  The  algorithm  is  computationally  concise,  yet  convergence  has  been 
proven  for  first  and  second  order  conditions. 

2.  The  sequence  of  current  solutions  remains  in  the  interior  for  the 
interior  points  method.   This  avoids  unwanted  local  minima  to 
some  degree. 
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3.  Highly  nonlinear  I  attaints  can  be  hand]  I  because  the  path  of 
solutions  does  not  attempt  to  follow  the  constraints.   The  path 
avoids  the  constraints  through  the  use  of  the  penalty  constant 
and  the  penalty  function. 

k.      It  includes  a  useful  unconstrained  technique  within  the  computa- 
tional algorithm. 

5.   A  large  variety  of  problems  can  be  solved. 

sse  conditions  suggest  that  the  method  be  used  in  a  combined  algorithm,  its 
principle  task  being  to  serve  as  a  first  approximation  to  the  gradient  projection 
method. 

The  principle  disadvantages  stem  from  equality  constraints.   It  should 
be  apparent  that  these  constraints  can  be  satisfied  only  in  the  limit  for  both 
the  interior  and  exterior  methods.   Something  must  be  done  to  overcome  this 
difficulty.   The  second  disappointment  is  the  slowness  of  convergence  for  general 
rules  which  affect  the  penalty  constant.   This  can  be  overcome  by  a  priori  knowl- 
edge of  the  problem;  however,  a  problem  solving  system  should  avoid  user  mani- 
pulation as  much  as  possible.   The  formulation  of  a  good  extrapolation  procedure 
has  not  been  achieved. 


2k 
3.   THE  PROJECTED  GRADIENT  METHOD 

3*1  Review  of  the  Problem 

The  second  portion  of  the  combined  method  is  taken  from  the  algorithm 
iosen  [12].   The  first  account  of  this  algorithm  appeared  in  a  paper  which 
dealt  with  linear  constraints,  and  this  was  followed  by  the  subsequent  paper  on 
nonlinear  constraints.   Hadley  [8]  includes  this  method  in  his  book  on  nonlinear 
programming.   The  gradient  projection  method  is  one  of  the  techniques  of  feasible 
Lirections.   It  attempts  to  move  the  solution  in  the  direction  of  the  negative 
gradient,  similar  to  the  procedure  of  the  unconstrained  method  of  steepest  de- 
scent.  Differences  occur  when  the  gradient  of  the  objective  function  points 
outside  the  feasible  region.   Following  the  gradient  outside  the  region  leads  to 
infeasibility,  so  other  directions  are  searched  to  find  the  "next  best"  direc- 
tion. 

3-2  The  Algorithm 

1.  The  starting  vector  can  be  found  in  two  ways.   It  can  be  determined 
either  by  the  sequential  unconstrained  minimization  method  or  by  the  projected 
gradient  method.   This  is  examined  in  more  detail  in  the  next  chapter. 

2.  If  the  starting  vector  is  interior,  the  current  solution  is  found 
by  the  formula 


xi  ■  xo  -  d  v  f0  > 


where  the  step  length  d  is  determined  by  a  one -dimensional  search.   Find  the 
step  length  which 


minimizes  f 
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subject  to 


0  <  d  <  d    , 

—   —  max 


where  d    is  the  longest  step  length  which  remains  in  the  feasible  region. 

3.   If  X,  lies  within  a  & -neighborhood  of  the  intersection  of  the  s 

K 

constraints,  then  the  solution  is  moved  in  the  direction  of  the  projection  of 
the  gradient  Vf,  on  the  intersection  G  of  the  tangent  hyperplanes  at  )C.   Deter- 
mine the  matrix 


uk  =  (vgl,  vg2,  .  .  .  ,  vgs)  . 


The  columns  of  this  matrix  must  be  linearly  independent  and  the  procedure  checks 
for  this  condition;  otherwise  dependent  columns  are  removed  until  the  remaining 
columns  are  independent.   Next  define 


\  ■  [\'\yl> 


which  exists  because  of  the  independence  condition  above,  and 


R 


k  k   k 


The  projection  of  the  gradient  on  the  intersection  G  of  the  tangent  hyperplanes 
at  I  is  then 

V^  ■ w  k  -  UA 
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where 


\  - x  -  \\\ 


The  proof  of  this  vital  equation  is  contained  in  the  following  section.   The 
test  for  the  optimum  is  started  by  finding 


P..  =  r..  /(2  sTv..) 
lk    ik/v    11 


where  r   is  the  i-th  element  of  the  vector  R  and  v. .  are  the  diagonal  elements 

of  the  matrix  V,  .   Define 
k 


Finally,  if 


a  =  max  [  II  PkVfk  || ,  max  p±k] 

i 


a  <  b/ 2s lii 


where 


6  =  specified  tolerance 

s  =  number  of  constraints 

I  =    a  bound  on  the  diameter  of  the  feasible  region 

U  =   measure  of  independence  of  the  active  constraints,  ||  V-.  f |  <  j/ 


then  f  is  within  a  6-distance  of  the  local  minimum. 


h.      If  a  <  S/2s£u   and  ||  P  Vf  |  =  max[p  ],  then  find  the  unit  vector 


k  k 


ik" 


"k-Pkvfk/l|Pkvfk 
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Now  compute  the  series  of  points  from  Y,  to  Y  which  projects  the  current  solu- 


tion back  into  the  feasible  region  by 


+  dN, 


Y„  = 


Yl  -  W  V 


Y  =  Y  .  -  u,  vn  w(y  . ) 

n    n-1    k  k  v  n-ly 


wh 


ere  w(Yi_1)  is  a  vector  which  is  a  measure  of  the  distance  to  the  intersection 


The  n-th  solution  is  found  at  a  5 -neighborhood  of  the  intersection  G,  and  the 
iteration  terminates. 


X2 

A 


(0,0) 


Figure  3.   The  correction  of  the  projected  gradient  back  into  the 
feasible  region. 
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5.   If  a  <  5/2si/Lz  and  |  P.Vf.  |l  <  max  3   then  drop  the  i-th  column 
of  the  U  matrix  which  corresponds  to  the  element  max  (3   .   The  new  matrix 

1 

becomes  U    and  the  process  is  continued  by  finding  the  new  V   ,  R   ,  P   , 
Vf  ,,  and  N    matrices  and  vectors.   The  notion  is  to  proceed  along  a  new 
intersection  G  by  dropping  the  appropriate  constraint. 


X2 

A 


,— CONTOURS  OF  THE 

x\  x„  x)0  x9  OBJECTIVE  FUNCTION 

X.3 


— -X| 


Figure  h.      Dropping  a  constraint  in  the  Projected  Gradient  Method. 

Figure  k   shows  how  this  movement  occurs  geometrically.   At  solution  X, »  the  path 
switches  to  intersection  G,  which  is  constraint  1  in  this  example,  from  inter- 
section G,  which  is  constraint  number  2.   The  constraint  is  d.  _pped  from  consid- 
eration because  the  optimum  occurs  at  X  ,  away  from  the  equality  condition  of 
constraint  2.   Point  X   is  not  a  stationary  point  because  the  objective  func- 
tion  can  be  decreased  by  moving  along  intersection  G. 

This  completes  the  explanation  of  a  typical  loop  in  the  projected 
gradient  method. 
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.  3  Proof  of   the  Project  id  .iradienl  Formula 
The  formula  in  question  is 

pkVfk  ■  Wlk  -  \\ 

where  the  notation  used  has  been  described  in  the  previous  section.   This 
equation  is  crucial  to  the  method  because  it  is  the  distinguishing  factor 
between  this  method  and  other  methods  of  feasible  directions,  i.e.,  it  is  a 
projection  of  the  gradient  onto  the  intersection  of  the  s  constraints.   It  is 
also  part  of  the  test  for  a  local  minimum.   A  necessary  condition  for  the  opti' 
mum  is  that 


V*  k  "  ° 


which  follows  automatically  when 


Vf .  =  0  . 
k 


A  sufficient  condition  is  that  r..  <  0  for  I  =  1.  ....  s  where  r..  is  the  i-th 

lk  '  '  ik 

element  of  the  vector  R.   Combining  these  conditions,  one  knows  that  a  move  in 
any  direction  fails  to  decrease  the  objective  function. 

The  first  step  in  verifying  the  equation  for  P  is  to  let 


E  =  E   +  E 
s    u    ju 


where 


E  =  Euclidian  s  space 

E  =  Sub space  spanned  by  the  columns  of  U, 

E  =  Sub  space  spanned  by  vectors  orthogonal  to  the  columns  of  U,  . 

JU  .K 


Now  let 


where 
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Vf ,  =  c  .  +  c  , 

k    uk    )ik 


C   =  component  of  the  gradient  in  Q,  which  is  the  tangent  to  the 
intersection  of  hyperplanes  at  X^, 

C  ,  =  orthogonal  component. 


X2 


X| 


Figure  5.   Representation  of  the  projected  gradient  formula. 


The  illustration  in  Figure  5  depicts  the  vectors  geometrically. 
Now 


Vf  =  C  .  +  C  ,   ,  C.eE   ,  C.eE 

k    uk    nk  '   uk   u  '   nk   u 


But  since  E  is  generated  by  the  columns  of  U 


C  ,  =  U. 


uk  ■  \\  ■ 


The 


re  CL    is  a  vector  of  coefficients.   Since  E  is  the  orthogonal  complement 


of  E  , 
u' 


u.  *c  .  =  0 

k  uk 


Hence 


\  OTk  ■  uk  "A 


\   •    ("kV"1  V^k 


since  the  inverse  exists.   It  follows  that 


uk     k    uk 


-V** 


where 


pk  - J  -  vVV"1  uk' 
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3-^     An  Example  Problem 

The  sample  problem  which  demonstrates  all  of  the  above  principles  is 
to  minimize  the  objective  function,  -3x, -x  ,  subject  to  the  constraints 

gl(X)  =  -x^/h   +  2x1  -x2  >  0 

g2(X)  =  x^  -  k  >  0 

g3(X)  =  x±  >   0 

gk(X)  =   x2  >  0 

figure  6  represents  the  problem  with  X  ,  the  initial  solution,  at  [2,3] •   It 
happens  to  be  on  a  boundary,  which  eliminates  the  first  step  in  the  algorithm. 
Note  that 


Wk  =  [-3,-1] 


Vg;L(X)  =  [-x.j/2  +  2,  -1] 


Vg2(X)  =  [x2,Xl]  . 


The  calculations  of  the  first  move,  which  are  included  in  Appendix  A,  indicate 
that  the  current  solution  is  not  a  local  optimum  because  the  projected  gradient 
is  not  zero.   The  direction  of  the  step  length  is  P-,Vf  and  the  path  moves  along 
this  direction  with  the  arbitrary  step  length  d  =  2.   Figure  6  shows  the  neces- 
sary vectors. 


g,(X)=o 


2    3    4    5    6    7    8 


X| 


Figure  6.   A  sample  problem  of  the  Projected  Gradient  Method. 

A  constraint  must  be  added  with  this  step  length.   These  calculations  are  also 
contained  in  the  Appendix.   The  solution  is  iterated  back  to  the  feasible  region. 
The  second  current  solution  is  found  by  this  process  to  be  (1.578,2.53*+)  and  the 
constraints  are  both  equal  to  zero.   The  computations  of  the  projected  gradient 
at  this  point  are  made,  and  the  result  is 


P2Vf2  ||  =0.0 


R0  =  (-0.^9527,-0.91+721+) 


Because  the  projected  gradient  is  zero  and  the  R  vector's  elements  are  negative, 
the  current  solution  is  a  local  optimum,  which  is  Xp.   Figure  7  illustrates  the 
local  conditions  for  optimality. 


3^ 


X2 


Figure  7.   The  optimum  in  the  Projected  Gradient  Method. 

3. 5  Step  Length 

The  choice  of  a  step  length  search  procedure  depends  upon  the  position 
the  current  solution.   If  it  is  in  the  interior,  a  simple  one -dimensional 
search  is  performed.   The  objective  function  to  be  minimized  is  f (X)  along  the 
direction 


where 


0  <  d  <  d 
—   —  max 
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Fhe  optimum  step  length  is  quite  easy  to  find  provided  the  function  is  unimodal. 

a  is  true  even  if  the  function  is  complicated  to  calculate.   Problems  arise 
when  the  function  is  multimodal,  and  these  are  explained  in  the  following  chapt. 
The  curvature  of  the  constraints  determines  the  length  of  the  step  if 
the  current  solution  lies  within  a  S-neighborhood  of  the  boundary.   A  tradeoff 
exists  between  the  length  and  the  time  required  to  project  back  into  the  feasible 

;on.   The  distinct  possibility  of  never  returning  to  the  feasible  region  exists 
for  highly  nonlinear  constraints,  especially  if  the  step  length  is  long.   Rosen 
[12]  suggests  that  a  good  step  length  is 


d  ■  <*W/«  * 


where 


6   =  unit  of  independence  of  vectors  of  U 

3    =  maximum  of  6.. 
max  ^ik 

0  =  measure  of  curvature  of  the  active  constraints. 


One  determines  the  matrix 


\  -  E<V  ■ 


32g1(xk)/5x1Sx1     .  .  .     a2g1(xk)/aXlaxi 


^'W^fri    ■  ■  •    ^.(V/3*,**, 


and 


\W 
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where  s  is  the  number  of  constraints  in  the  intersection.   This  particular  step 
length  is  one  of  the  most  popular,  but  there  are  many  alternatives. 

An  arbitrary  constant  step  length  could  be  used  if  the  second  deriva- 
tives are  expensive  to  compute.   The  time  taken  to  achieve  convergence  would  be 
increased.   Brown  [l]  arrives  at  the  constant  i/50  in  which  I   is  the  length  of 
the  longest  vector  in  the  feasible  region.   If  the  optimum  happens  to  be  between 
the  intervals,  two  things  are  possible.   The  function  can  be  interpolated  and 
the  answer  approximated,  or  the  grid  size  is  refined  again  and  again  until  the 
proper  accuracy  is  achieved.   The  latter  case  appears  to  be  the  better  because 
termination  occurs  at  the  desired  precision.   It  halts  at  5-closeness  to  the  mini- 
mum.  For  these  reasons  it  is  used  in  the  combined  algorithm. 

3.6  Zigzagging 

All  gradient  methods  have  major  disadvantages  when  the  contours  do  not 
form  well  conditioned  hills,  i.e.,  when  ridges  and  valleys  occur.  Ridges  are 
troublesome  to  define  except  in  a  vague  sense  because  a  change  of  scale  affects 
the  conditioning  of  the  hill.   One  way  to  categorize  ridges  is  by  the  difficulty 
of  descent  or  the  rate  of  convergence.  Difficulty  of  descent  is  determined  by 
the  size  of  the  arc  of  a  circle  which  encompasses  directions  which  result  in  a 
decrease  of  the  function  for  a  given  step  length.  Usually  a  small  step  length 
makes  movement  easier,  and  a  variable  step  length  helps  handle  ill-conditioned 
hills.   This  is  another  advantage  of  the  variable  step  length.  Figure  8  shows 
the  trouble  with  climbing  a  hill  with  a  gradient  technique.   Zigzagging  is  demon- 
strated.  In  some  problems  the  procedure  runs  thousands  of  iterations  without 
noticeable  improvement.   A  change  of  scale  could  be  made  to  round  the  ridges,  but 
the  axis  of  the  hill  must  be  known.   This  is  seldom  the  situation.  A  good  tech- 
nique is  to  shorten  the  optimum  step  length  by  0.9  and  use  this  length. 
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Figure  8.   An  example  of  zigzagging. 

Overshoot  is  lessened.   Another  ridge  correction  technique  is  to  approximate  the 
ridge  axes  by  a  quadratic  equation.   If  the  second  derivatives  are  known,  this 
method  accelerates  movement  along  the  ridge. 

These  methods  are  not  cure-alls  for  zigzagging;  they  are  merely 
attempts  to  speed  convergence. 


3«7  Convergence 

The  gradient  projection  method  is  proven  to  converge  to  a  local  mini- 
mum if  certain  conditions  are  fulfilled;  however,  an  inordinate  number  of  itera- 
tions is  sometimes  required.   This  situation  is  clearly  unacceptable.   One  way 
this  happens  is  by  the  accumulation  of  roundoff  and  truncation  errors.  For 
example,  the  length  of  the  projected  gradient  might  never  reach  zero,  and  a 
reasonable  approximation  must  be  made.   The  procedure  terminates  at  a  5-close- 
ness.  Another  method  for  the  determination  of  the  local  optimum  is  to  test  the 
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rate  of  convergence 

*  -  (fk  -  fk+iVfk 

and  halt  below  a  predetermined  amount.   The  projected  gradient  method  uses  this 
idea  in  the  a  test.   It  is  the  usual  scheme  when  the  solution  occurs  in  the 
interior  of  the  feasible  region. 

What  happens  to  zigzagging  when  a  test  for  the  optimum  is  made  by  a 
test  of  convergence?  Perhaps  the  path  of  solutions  is  climbing  a  ridge  or  is 
stuck  on  a  peak.   Even  worse,  the  current  solution  could  be  alternating  around 
the  optimum  in  some  random  fashion. 

Hadley  [8]  utilizes  an  approximation  to  speed  convergence.  A  second 
order  Taylor  series  is  used 


f(x)  =  f  k  +  Vf^(X  -  Xk)  +  |  (X  -  Xk)'  Lk(X  -  Xk) 


where  L  is  the  Hessian  matrix.   The  estimate  for  the  stationary  point  is  found 
by  setting  the  gradient  equal  to  zero  in  this  approximation.   The  following 
system  of  equations  is  the  end  result 


-1 

^k  ~k 


xk,, 


The  answer  is  found  by  solving  this  system.   This  method  is  used  only  in  the  last 
few  iterations  because  it  is  computationally  involved. 

The  rate  of  convergence  is  not  a  clear  concept.   One  never  knows  if 
the  process  will  accelerate  just  around  the  next  bend.  An  analogy  can  be  drawn 
with  the  driver  of  an  automobile  who  is  attempting  to  pass  a  truck  in  the  moun- 
tains.  Should  he  risk  obliteration  (infeasibility)  by  passing  on  a  curve?  Per- 
haps it  is  more  profitable  to  ride  the  curves  out  in  the  hope  of  nearing  a 
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straightaway.   The  answers  to  these  questions  are  similar  to  those  asked  in 
nonlinear  programming. 


k.      THE  CM  BUSED  METHOD 

^■.1  Discussion  of  the  Method 

The  methods  described  in  the  two  previous  chapters  are  now  synthesized 
into  a  working  system  to  solve  the  widest  possible  range  of  applications.  Both 
methods  are  applied  in  solving  most  problems  because  the  advantages  are  better 
utilized  in  this  fashion.  Use  of  the  combined  method  will  increase  the  probabi- 
lity of  attaining  the  global  optimum.   This  chapter  describes  the  system  as  a 
whole  with  the  aid  of  the  flow  diagram  in  Appendix  B.   The  first  diagram  is  a 
short  summary  of  the  Combined  Method,  which  should  be  easily  understood  at  this 
point;  the  details  of  the  algorithm  are  described  in  the  subsequent  flow  diagrams. 
The  individual  decisions  in  the  diagrams  are  fully  discussed,  as  are  the  particu- 
lar conditions  for  termination,  such  as  infeasibility,  optimality,  etc.   The  parts 
of  the  flow  diagram  which  involve  parallel  computations  are  mentioned  in  this 
chapter  and  are  described  in  the  following  chapter.   Otherwise,  the  description 
of  the  system  is  contained  below. 

As  in  the  simplex  method  of  linear  programming,  the  Combined  Method  is 
divided  into  two  phases.   However  Phase  I  is  only  concerned  with  determining  an 
interior  feasible  solution  instead  of  a  basic  feasible  solution.  When  an  array 
computer  is  used  in  the  computation,  multiple  interior  points  are  determined. 
Phase  ::  also  predicts  the  infeasibility  of  the  problem  if  it  is  unable  to  find 
a  solution  in  the  feasible  region. 

Phase  II  begins  with  the  interior  solutions  and  attempts  to  find  the 
global  optimum.   The  procedure  remains  within  the  feasible  region  unless  the  pro- 
jected gradient  points  outside  the  region,  in  which  case  a  correction  is  quickly 
rendered.   The  procedure  can  therefore  be  stopped  at  any  step  with  an  improved 
current  solution.  Many  local  optima  are  usually  found  which  help  determine  the 
global  minimum  by  a  comparison  of  them  and  choice  of  the  smallest.   The  number  of 
local  optima  calculated  is  proportional  to  the  probability  of  finding  the  global 
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optimum.   Hence  more  iterations  of  -he  procedure  gives  better  results  with  a 
trade-off  at  some  point,  at  which  the  increase  in  accuracy  is  equal  in  value  to 
the  cost  of  calculating  the  increase.   The  locking  test  defines  the  trade-off 
point  and  terminates  at  that  position. 

1+.2  Phase  I  of  the  Algorithm 

An  advantage  of  the  Combined  Method  is  the  lack  of  need  for  an  initial 
basic  feasible  solution.   A  feasible  point  is  needed,  however,  and  an  interior 
point  is  helpful.   Most  nonlinear  algorithms  apply  their  procedure  to  an  in- 
feasible  point  to  find  the  interior  initial  solution.   Both  techniques  of  the 
Combined  Method  can  employ  this  approach. 

Phase  I  is  crucial  to  the  entire  formulation  because  the  final  solu- 
tion depends  a  great  deal  upon  the  initial  solution.   Nonlinear  programming  is 
essentially  a  heuristic  search  which  claims  little  more  than  the  ability  to 
determine  a  stationary  point --usually  a  minimum  when  the  direction  employed  is 
the  negative  gradient.   The  inherent  uncertainty  surrounding  the  solution  can 
be  lessened  by  reapplying  the  procedure  at  different  initial  points,  but  the 
fact  remains  that  the  global  optimum  cannot  be  found  under  certainty  for  non- 
convex  problems.   Hence  many  optima  are  found  in  the  Combined  Method. 

The  author  contends  that  the  best  procedure  for  Phase  I  is  to  pierce 
the  feasible  region  as  much  as  possible,  with  the  initial  solution.  ■  The 
Sequential  Method  can  take  over  from  this  interior  point,  and  accuracy  is  im- 
proved because  the  current  solution  has  not  been  greatly  altered. 

One  begins  the  procedure  by  dividing  the  constraints  into  two  groups: 

(1)  large  separable  polynomials  approximations  of  the  constraints  and 

(2)  smaller  general  constraints.   The  two  groups  are  stored  in  different  ways 
so  that  the  parallel  capability  of  the  computer  is  realized.   The  process  is 
defined  in  detail  in  the  following  chapter.   In  either  case,  many  interior  solu- 
tions are  determined  in  Phase  I. 
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The  problem  in  Phase  I  is  tj  determine  a  vector  X  -which  satisfies 
the  constraints 


g.(X)  >  b.         i  =  1,  .  .  •  ,  m 
i      l 


or 


g.(X)  -  b.  >  0  . 


The  first  step  is  to  pick  many  arbitrary  initial  feasible  vectors  X  , 
by  any  process.   One  should  use  any  previous  information  in  picking  these 
vectors,  otherwise  they  can  be  chosen  by  a  random  number  generator,  within 
certain  bounds.   These  vectors  are  substituted  into  the  constraints  and  they 
are  divided  into  three  sets — J,  K,  L.   The  next  test  branch  in  the  flow  diagram 
is  needed  to  handle  the  different  forms  of  storage.  Multiple  optima  are  found 
in  two  different  fashions;  these  are  listed  under  number  7  and  8  in  the  flow 
diagram. 


Set  J  =  (a|g  (Xj  -  b   =  0} 


Set  K  -  {c|gc(X0)  -  bc  >  0} 


Set  L  =  U|gi(X0)  -  h£  <  0) 


The  procedure  forces  J  and  L  into  becoming  null  sets,  with  the  elements 
goin  g  into  set  K  so  that  the  conditions  are  fulfilled.   Select  a  series  of  solu- 
tions which  increases  the  first  constraint  of  set  L,  without  violating  the  con- 
straints  of  set  K.   Set  J  is  checked  as  the  procedure  continues  and  put  into 
set  K  or  L  appropriately.   Using  the  unconstrained  method,  minimize 
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In-HOC^)  =  -  Z  lefa)   -  b/]  +Pk  E  hc(yck) 

i€L  CGK 

which  is  called  the  infeasible  revised  objective  function.   The  negative  sign 
in  front  of  the  first  summation  is  necessary  because  one  wants  to  increase  the 
desired  constraint,  and  this  equation  is  equivalent  to  maximizing  a  positive 
quantity.   The  minimization  is  performed  in  two  separate  ways  depending  upon 
the  existence  of  second  partial  derivatives.   The  Newton  Method  is  used  if  the 
derivatives  exist;  otherwise  a  conjugate  direction  method  is  used.   The  flow 
diagram  accounts  a  multi -model  revised  objective  function  through  the  global 
test.   This  test  reiterates  the  procedure  for  another  minimum  if  it  is  not 
satisfied  with  the  solutions  at  that  point.   A  sequence  of  iterations  is  per- 
formed with  a  decreasing  p  .   The  sets  are  continuously  checked  for  the  proper 
inequality  during  the  sequence.   A  constraint  of  set  L  leaves  the  set  when  it 
becomes  positive  and  is  put  into  the  K  set.   The  process  halts  when  set  L  is 
null.   Set  J  is  probably  null  at  this  time  also.   If  it  is  not,  the  minimization 
is  performed  on  the  elements  of  set  J.   The  problem  is  infeasible  if  either  set 
L  or  J  have  elements  when  the  minimum  is  found,  provided  the  function  is  convex. 
Otherwise,  the  procedure  is  returned  many  times  to  test  multiple  starting  points 
through  the  test  until  it  is  satisfied  with  infeasibility.   At  some  point,  the 
answer  at  the  test  is  negative  and  the  algorithm  halts. 

The  procedure  assumes  that  the  equality  constraints  have  been  converted 
to  inequalities  by  the  formula  of  Chapter  1.   If  the  conversion  proves  trouble- 
some, the  equalities  can  be  ignored  in  Phase  I  but  must  be  reckoned  with  sepa- 
rately in  the  remainder  of  the  algorithm.   The  interior  and  exterior  methods  co- 
operate to  handle  equalities.   Once  multiple  interior  points  have  been  calcu- 
lated, the  procedure  switches  to  Phase  II,  which  improves  the  solutions. 
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4.3  Feasibility  Test 

The  previous  section  provides  a  test  of  feasibility  if  the  constraints 
are  convex,  and  it  can  be  applied  to  nonconvex  problems  through  many  iterations. 
The  author's  proposal  is  to  work  with  the  equality  constraints,  which  are  usually 
few  in  number.   One  limits  the  range  of  possible  answers  by  finding  a  solution  of 
these  equations.   This  method  is  costly  in  machine  time  and  should  be  used  only 
if  the  method  in  the  previous  section  fails  two  or  more  times. 

The  method  finds  any  solution  of  the  equation 

Z  [g.(X)  -  b.]2  =  0 
iej 

for  the  equality  constraints  by  minimizing  this  system.   At  the  minimum,  the 
final  vector  is  the  solution  of  the  system.   The  minimization  utilizes  any  un- 
constrained procedure,  such  as  Steepest  Descent  or  Newton  Second  Order  Method. 
Another  technique  is  to  approximate  the  function  by  any  interpolation  procedure. 
When  a  root  of  the  polynomial  is  found,  check  it  against  the  equality  constraints. 
If  the  desired  accuracy  is  not  achieved,  use  a  higher  order  interpolation  formula. 

This  procedure  is  not  entered  on  the  flow  diagram  because  the  first 
method  is  usually  satisfactory,  but  it  can  be  added  quite  easily  at  entry  point 
number  11  in  the  flow  diagram. 

k.k     Phase  II  of  the  Algorithm 

The  second  portion  of  the  algorithm  is  the  most  important  and  the 
longest  with  regard  to  machine  usage.   It  begins  with  the  multiple  interior 
point  solutions  and  attempts  to  find  the  global  optimum.   In  most  situations 
the  global  optimum  cannot  be  found  with  certainty  and  one  must  be  satisfied 
with  a  good  local  optimum.   The  locking  test  is  the  crucial  portion  of  Phase  II 


k5 

ause  it  determines  the  technique  ij  be  used  during  Phase  II.   It  also  pre- 
dicts the  final  stationary  points  and  termination.   A  discussion  of  the  test  is 
presented  in  the  following  section. 

The  first  portion  of  Phase  II  to  be  examined  is  the  Sequential  Uncon- 

■  lined  Method  and  it  is  located  at  entry  point  number  6.  Many  solutions  are 
found  during  this  portion  of  the  algorithm,  and  a  division  of  the  flow  diagram 
is  made  to  show  how  each  type  of  constraint  finds  these  solutions.   A  test 
divides  the  separable  polynomial  constraints  and  the  general  constraints.   The 
next  stage  is  to  find  the  penalty  constant  p  ,  which  very  often  is  1  for  the 

ri 

first  iteration.   For  the  problems  with  general  constraints  many  solutions  can 
be  found  simultaneously  whereas  a  sequential  loop  is  required  for  the  problems 
with  separable  polynomial  constraints.   Then  the  sequential  approach  is  selected 
and  the  penalty  function  is  determined.   These  are  fixed  for  all  solutions  during 
application  of  the  Sequential  Unconstrained  Minimization  algorithm.   The  revised 
objective  function  (functions  for  general  constraints)  is  formed  and  minimized. 
Many  local  minima  of  the  revised  objective  function  are  also  found  in  the  same 
fashion  as  above.   Finally,  the  penalty  constant  is  lowered  and  the  procedure 
is  reiterated  until  a  satisfactory  improvement  is  obtained.   The  Sequential  Un- 
constrained  Method  does  not  usually  find  the  final  stationary  solution  because 
the  Projected  Gradient  Method  has  faster  convergence  near  the  end  of  the  optimi- 
zation procedure. 

Many  similar  tests  are  performed  during  the  Projected  Gradient  portion 
of  Phase  II.   One  additional  test  is  for  interior  points  because  the  procedure 
uses  a  different  algorithm  depending  upon  whether  the  solution  is  on  the  boundary 
pr  in  the  interior.   Separable  polynomial  constraint  problems  again  are  handled 
jin  a  different  manner.   Many  optima  are  found  during  the  procedure  with  both 
portions  of  Phase  II.   Control  is  returned  to  the  locking  tests  at  this  point. 
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U.5  The  Locking  Test 

A  recurring  question  in  the  system  is  the  determination  of  the  proper 
technique  to  apply  in  Phase  II.  When  does  it  become  advantageous  to  switch  to 
the  Projected  Gradient  Method  instead  of  continuing  the  Sequential  Method?  An 
estimate  of  the  future  rate  of  convergence  is  made.   Obviously,  certainty  does 
not  enter  into  the  prediction,  and  no  exact  answer  exists. 

Each  method  works  better  for  various  problems.   Two  benefits  of  the 
Projected  Gradient  Method  are  the  existence  of  linear  constraints  and  of  few 
hyperplanes  in  the  intersection  at  the  optimum.   On  the  other  hand,  the  Sequen- 
tial Method  handles  nonlinear  constraints.   To  determine  the  appropriate  method 
to  be  used,  the  following  scheme  is  suggested. 

1.  Determine  an  index  which  categorizes  the  type  of  problem  according 
to  size  and  linearity  and  the  user's  preference.   This  index  runs  from  zero, 
which  means  a  few  very  nonlinear  constraints,  to  one  which  represents  a  large 
number  of  quasi-linear  constraints. 

2.  The  locking  test  is  made  at  various  iterations  to  determine  the 
rate  of  convergence.   A  part  of  the  test  is  made  with  the  equation 

'  -  (fk  '  fk+iVfk  ! 

which  tests  for  the  rate  of  convergence. 

3-   The  locking  test  keeps  the  current  technique  in  position  according 
to  the  index.   If  it  is  zero,  the  Sequential  Unconstrained  Technique  is  used 
alone.   If  the  index  is  1,  the  Projected  Gradient  Method  is  the  only  method 
employed  in  the  problem. 

h.      Indices  between  the  extremes  are  weighed  proportionately.   The 
algorithm  quickly  switches  to  the  Projected  Gradient  Method  when  the  index  is 
large  and  the  convergence  is  small.   Conversely,  the  Sequential  Unconstrained 
Method  is  used  often  when  the  index  is  small. 
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5.   The  locking  test  also  determines  the  final  stationary  optimum. 
It  must  analyze  the  previous  information  and  predict  the  termination  point. 

Global -Local  Conditions 


The  global-local  difficulty  has  been  mentioned  often  in  this  paper, 
nonconvex  problems  there  is  no  certainty  that  the  final  solution  is  the  global 
optimum.   For  many  types  of  problems,  this  condition  is  usually  not  serious  be- 
cause a  close  approximation  is  adequate.  Many  precautions  are  taken  in  the 
Combined  Method  to  overcome  this  problem.   For  example,  multiple  interior 
points  are  found  in  Phase  I  which  lead  to  many  local  optima  during  the  procedure. 
Even  in  the  minimization  of  the  revised  objective  function,  many  local  optima 
are  determined  to  increase  the  probability  of  finding  the  global  solution.   Few 
algorithms  have  the  flexibility  of  the  Combined  Method  built  into  the  system  for 
finding  the  global  optimum.   This  useful  capability  is  one  of  the  advantages  of 
the  Combined  Method. 

The  power  of  the  simplex  method  in  linear  programming  stems  from  the 
ability  to  locate  the  global  optimum  under  certainty.   Some  types  of  problems 
are  better  solved  by  linear  approximations  because  of  the  global  conditions  and 
;he  cutting  plane  method  uses  this  idea;  however,  the  approximation  leads  to 
Lnfeasible  solutions  in  many  situations.   Perhaps  nonlinear  programming  will  also 
tichieve  the  capability  of  the  simplex  method  someday.   Until  that  time,  methods 
-ike  the  Combined  Method  must  be  used. 
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5.   IMPLEMENTATION  ON  ILLIAC  TV 

5-1  Parallel  Computations 

The  purpose  of  this  chapter  is  to  present  some  of  the  difficulties 
encountered  in  actual  implementation  of  the  Combined  Method  on  an  array  computer 
such  as  ILLIAC  IV.   Solutions  for  some  of  these  difficulties  are  described  and 
alternative  suggestions  are  mentioned.   The  parallel  portions  of  the  computa- 
tional algorithm  are  fully  explained  and  storage  schemes  are  developed.   All 
nonlinear  programming  problems  to  be  solved  by  the  Combined  Method  are  divided 
into  two  groups:   (a)  the  problems  with  a  large  number  of  constraints  which 
are  linear  to  a  great  extent,  and  (b)  the  smaller  problems  with  a  more  general 
form  of  constraints.   Storage  schemes  are  developed  for  each  group  to  handle 
particular  characteristics  as  explained  below. 

ILLIAC  IV  is  a  large  scale  array  computer  which  is  divided  into  four 
quadrants  with  6h   processing  elements  (PE)  and  6k   processing  element  memories 
(PEM)  per  quadrant.   The  Combined  Method  is  designed  to  fit  into  one  quadrant 
but  can  be  expanded  at  a  later  date.  A  quadrant  has  one  control  unit  (CU)  which 
broadcasts  simultaneous  instructions  to  the  PEs.   Individual  PEs  operate  only  if 
the  mode  is  correctly  set.   The  idea  is  to  keep  a  maximum  number  of  PEs  operating 
to  enhance  the  efficiency  of  the  machine.  All  computer  installations  are  con- 
cerned with  efficiency  and  ILLIAC  IV  is  no  exception.   This  is  emphasized  by  the 
fact  that  6*+  PEs  must  wait  for  an  input-output  request.   Parallel  calculations 
improve  this  efficiency. 

Any  algorithm  can  be  coded  in  parallel  to  some  degree;  however,  certain 
techniques  are  better  than  others.  Much  of  the  time,  the  Projected  Gradient 
Method  builds  on  previous  steps.   The  Sequential  Unconstrained  Method  also  uses 
a  great  deal  of  previous  information  during  the  sequence.   Portions  of  the  method, 
such  as  the  locking  test,  are  not  so  satisfactory  for  parallel  computations. 
Nonetheless,  nost  of  the  algorithm  can  be  successfully  implemented. 
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Storage  Methods 

The  division  of  the  constraints  into  two  groups  has  been  discussed 
Ler  in  the  paper.   This  division  is  made  to  better  utilize  the  parallel 
capabilities  of  the  machine.   Tt  is  possible  to  divide  the  types  of  problems 
according  to  many  criteria  other  than  that  used  in  the  Combined  Method.   Perhaps 
the  user  wishes  to  refrain  from  approximating  the  constraints  in  a  large  problem. 
He  is  faced  with  the  difficulty  of  storing  the  code  on  the  disk  and  waiting  for 
retrieval  of  the  information.   Depending  upon  the  particular  type  of  problem, 
he  must  devise  some  method  which  improves  efficiency.   These  kinds  of  problems 
are  not  examined  here  because  of  the  enormous  variety  of  the  forms  of  large 
general  constraints.   Also  these  problems  are  not  common  because  of  the  effort 
required  in  setting  them  up. 

The  author  does  not  discount  all  large  size  nonlinear  problems;  how- 
ever, these  are  limited  to  those  which  can  be  approximated  by  separable  poly- 
nomial equations.   These  have  the  form 

r 

g,  (Xn  )  -  bn  =  Z  (a  .x .  +  .  .  .  +  a,  .x .  +  a  . ) 

&lv  1J  1    .   v  nj  j  lj  j    oj' 

where  n  is  the  highest  order  of  the  polynomials.   The  objective  function  can 
possess  any  reasonable  form,  but  the  constraints  must  have  the  above  formulation. 
Linear  programming  constraints  have  this  form  and  are  clearly  a  subset  of  the 
separable  polynomials.   Large  size  constraints  which  do  not  possess  this  shape 
are  approximated  by  n-th  order  polynomials  before  the  procedure  is  run.   The 
storage  of  the  separable  polynomials  is  a  variation  of  the  "strewing"  method 
devised  for  the  ILLIAC  IV  LP  algorithm  with  the  columns  and  rows  interchanged. 
The  method  assigns  coefficients  of  the  columns  of  the  constraints  depending 
upon  the  number  of  elements.   The  most  dense  column  enters  PEn,  the  next  dense 
into  PE  ,  and  proceeds  to  PE^-  where  the  order  of  assignment  is  reversed.   The 
densest  remaining  column  enters  PE/-~  and  the  process  returns  to  PE~.   The 
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overlapping  of  columns  is  continued  'mull  no  coefficients  remain.  Next,  the 
PEs  are  sorted  according  to  rows.   Each  coefficient  is  stored  in  32  hit  half- 
words  with  the  power  and  the  column  and  row  tags  in  the  remaining  portion  of  the 
word.   This  storage  scheme  has  many  advantages:   (l)  the  rows  can  be  calculated 
in  parallel  and  the  columns  are  accessible  if  needed;  (2)  the  number  of  elements 
per  PE  is  approximately  equal;  (3)  only  the  nonzero  elements  are  stored,  which 
saves  space  because  of  the  sparseness  of  large  problems.   The  general  objective 
function  is  stored  in  a  manner  sijnilar  to  the  method  below. 

The  general  constraints  are  stored  in  a  different  fashion  than  the 
separable  polynomials.   For  example,  the  constraint 


In  xr 


-1   """**  "2        -1 
g1(Xk)  =  sin  (x±  x^  +  esc  x^)  , 


must  be  stored  in  some  fashion.   It  is  translated  into  code  which  is  entered  in 
a  straight  storage  scheme. 
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instl' 
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inst3  inst^ 
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X13     \Xlk 
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X22 

X23     \X2h 

PEM  63 


inst     '  inst 
127    |     128 

Xl,127j  Xl,l28 
X2,127jX2,128 

Figure  9*   A  straight  storage  scheme. 
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opt.  To  realize  the  parallel  cap-  ,  'he 
algorithm  calculates  many  simultaneous  vectors.  Tli  liagram  Lustrates 
this  fact.   Thr  data  now  is  the  generated  matrix  of  ti  -h 

Ls  aiso  illustrated  in  Figure  9.   The  first  subscript  represents  the  position 

Mi  in  a  solution,  and  the  second  subscript  designates  the  current  solution; 
for  example,  X   is  the  first  position  within  the  k-th  current  solution.   In 

it  cases  6k   solutions  are  computed  together,  which  helps  improve  the  global- 
local  difficulties. 

These  two  diverse  storage  schemes  represent  two  trains  of  thought. 
The  first  idea  is  that  large  problems  are  so  unruly  that  any  reasonable  answer 
Is  usually  sufficient;  hence  the  parallel  capabilities  are  used  to  achieve  this 
solution.   The  algorithm  gives  better  solutions  by  reiteration  as  often  as  the 
user  wishes.   Problems  with  a  smaller  number  of  general  constraints  are  handled 

a  different  philosophy.   The  parallel  capability  is  used  to  predict  the  global 
optimum  by  calculating  a  multitude  of  local  optima  simultaneously  and  its  advan- 
tage lies  in  this  concept. 

5.3  Disk  Delays 

Although  the  ILLIAC  IV  is  designed  with  an  extremely  fast  disk  storage, 
it  is  slow  relative  to  the  speed  of  computations  of  the  machine.   The  delays  for 
input-output  of  information  must  be  minimized,  of  course,  but  the  latency  time 
also  must  be  utilized.   In  other  words,  the  time  when  the  computer  is  requesting 
information  must  be  used  for  other  calculations.   The  locking  test  is  ideally 
suited  for  this  delay  purpose  because  it  is  performed  between  iterations.   The 
locking  test  determines  the  subsequent  method  to  be  used  while  the  data  is  being, 
read  into  memory.   This  procedure  improves  efficiency  because  the  locking  test 
is  a  sequential  operation  for  the  most  part  and  it  would  slow  the  algorithm  if 
used  elsewhere. 
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The  Combined  Method  has  relatively  few  iterations  because  it  attempts 
to  maximize  improvement  of  the  objective  function  at  each  step.   In  this  respect, 
it  is  well  suited  to  an  array  computer  because  it  minimizes  the  difficulty  men- 
tioned heretofore.   Problems  with  few  iterations,  even  though  the  computations 
per  iteration  might  be  complicated,  are  much  more  appropriate  to  the  ILLIAC  IV. 
The  Combined  Method  fulfills  these  requirements. 

5«*+  Computational  Procedure 

The  algorithm  has  been  explained  above,  but  some  of  the  more  elementary 
calculations  have  not  been  discussed.   This  section  attempts  to  alleviate  this 
omission. 

The  heart  of  the  Unconstrained  Technique  is  the  incorporation  of  the 
constraints  into  the  objective  function  which  results  in  an  unconstrained  prob- 
lem.  This  conversion  is  a  likely  target  for  parallel  calculations,  and  the  in- 
verse penalty  function  is  nice  in  this  respect.   Recall  the  form  of  the  inverse 
revised  objective  function 


m 
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To  determine  the  revised  objective  function,  the  algorithm  computes  the  constrain 
vector  simultaneously  across  processing  elements  for  separable  constraints, 
broadcasts  p  to  all  elements,  multiples,  and  finds  the  sum  within  each  PE  and 
then  globally  across  the  PEs.   The  conversion  takes  place  frequently  and  is 
therefore  important.   For  general  constraints  multiple  solutions  are  determined. 

The  computation  of  the  logarithmic  penalty  function  also  is  an  easy 
task  to  perform  in  parallel.   The  familiar  relationship 

in  x  +  £n   y  =  £n   xy 
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used  to  advantage.   Substitution  f  the  vector  X  Lnto  the  constraints  is 

milarl.v  bo  the  Inverse  penalty  function.   The  operands  are  multiplied, 
i  the  natural  logarithm  is  taken. 

A  most  important  portion  of  the  Combined  Method  is  the  computation  of 
Lient.   It  is  the  central  idea  in  the  Projected.  Gradient  Method  and  is 
it  useful  and  necessary  in  the  first  portion  of  the  Combined  Method  in  mini- 
an  unconstrained  function.   The  gradient  is  defined  by  the  equation 

Vf(X)  =  (df/tfx^  .  .  .  ,  df/dx  ) 

and  the  partial  derivatives  are  usually  found  through  the  approximation 

df/o^  =  (f(x^   +  D)  -f^  -  D))/2d 

where  D  =  (d,  0,  .  .  .  ,  0) 
which  is  called  the  central  difference  equation.   It  has  better  accuracy  than 
the  forward  or  backward  difference  equations.   Of  course  exact  derivatives  are 
used  when  they  are  easily  calculated,  as  in  polynomials.   Here  the  derivatives 
are  fed  as  input  into  the  system.   The  direction  of  the  gradient  is  normalized 
by  the  equation 


=  [(df/dXl)  ,."..,  (df/dxn)]/a  , 

l2~ 


a  = 


n 
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The  Sequential  and  Projected  Methods  have  separate  means  of  termination 
at  a  local  stationary  point.   Necessary  and  sufficient  conditions  show  that  both 
of  these  methods  usually  arrive  at  an  optimum  in  the  limit.   The  Sequential 
Method  is  especially  weak  when  the  final  solution  occurs  at  a  boundary.   Hence 
the  projected  gradient  test  for  an  optimum  is  used  instead.   This  test  takes  the 
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form 
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and 


R..  <  0        i  =  1.  .  .  .  ,  s 


where  5  is  the  tolerance  presented  heretofore.   When  these  conditions  are  ful- 
filled, the  current  solution  is  no  more  than  a  distance  8  from  the  local  opti- 
mum.  The  first  equation  is  computed  sequentially  in  the  locking  test  and  com- 
pared against  the  constant.   Time  is  lost  in  many  of  the  elements  but  not  a 
great  deal  because  of  the  shortness  of  this  computation.   The  Rs  can  be  computed 
in  parallel,  on  the  other  hand.   The  rows  and  columns  of  the  matrices  can  be 
operated  on  together  for  calculation  of  the  R-vector.   One  test  is  made  to  deter- 
mine if  any  are  negative. 

5. 5  Size  Limitations 

The  number  and  magnitude  of  the  constraints  determines  the  size  of  the 
nonlinear  problem.   The  system  can  be  quite  large  for  the  quasi-linear  con- 
straints.  In  fact  nonlinear  programming  techniques  might  be  used  successfully 
in  linear  programming.   The  storage  would  be  the  same  as  in  the  previous  section 
for  separable  constraints.   At  the  present  time  it  is  difficult  to  determine  the 
feasibility  of  this  proposal.   The  gradient  method  cuts  across  the  feasible  re- 
gion, in  contrast  to  the  simplex  method  which  follows  the  boundaries  at  basic 
feasible  solutions.   The  advantage-  of  the  gradient  method  will  be  determined  when 
a  comparison  of  the  number  of  iterations  is  developed. 


us  on  ,      lonlinear 
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t\  uit.-i  i  i        mall  nonlinear  problems  is  the  con- 

:ept  of  "multiple  search".   he  process  is  described      i    mpl  one-dimen- 
Lrch  below.   The  curve  of  Figure  10  has  many  peaks.   How  does  as 
id  the  global  optimum  for  this  function?  For  an  array  computer,  a  multiple 

jorithm  Ls     ested.   Although  the  example  is  elementary,  the  uses  are 


Figure  10.   Multimodal  functions. 


many;  for  example,  the  search  for  the  optimum  step  length  along  the  negative 
gradient  direction  is  a  one -dimensional  search.   The  procedure  first  determines 
the  boundaries  of  the  zone  of  uncertainty.   The  zone  is  divided  into  6k   smaller 
intervals  and  the  values  of  these  points  are  found  in  parallel.   A  sweep  of  the 
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6U  functional  values  decides  upon  the  number  of  peaks  in  the  zone.   In  the 
example  three  valleys  are  found.   The  process  now  switches  to  the  sequential 
mode  in  which  each  of  the  three  valleys  are  closed  upon  together.   If  the  inter- 
val had  been  larger,  the  simultaneous  search  would  have  been  performed  a  few 
more  iterations.   By  this  method,  many  local  minima  are  found.  When  the  zone 
of  uncertainty  has  been  closed  sufficiently  the  optima  are  compared  for  the  best 
value.   The  probability  of  finding  the  global  optimum  is  greatly  increased. 
This  same  idea  can  be  applied  to  more  complicated  searches  such  as  the  Sequen- 
tial Unconstrained  Minimization  Technique.   Parallel  computations  make  this  task 
worthwhile  and  profitable. 
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CON  LUSION 

This  paper  has  examined  various  algorithms  to  d        bhe  most 
Le  to  utilize  on  the  array  computer  ILLIAC  IV.   Marry  of  these  methods, 
s  the  Cutting  Plane  Technique,  were  rejected  because  of  the  lack  of 
at ions  for  nonconvex  problems.   Other  methods  are  rejected  because  of 
the  slowness  of  convergence  for  general  problems.   The  best  methods  are  the 
Sequential  Unconstrained  Minimization  Technique  and  the  Projected  Gradient 
Method.   These  methods  work  in  nonconvex  regions,  have  good  convergence  pro- 
perties, and  augment  each  other  quite  effectively.   For  these  reasons,  the  two 
algorithms  are  combined  into  a  single  system  for  solving  nonlinear  problems. 

Detailed  questions  about  the  Sequential  Unconstrained  Technique  were 
examined  and  answered.   The  Projected  Gradient  Method  was  presented  in  depth,  and 
the  difficulties  examined.   The  actual  implementation  of  the  Combined  Method  was 
examined,  and  comments  injected  where  applicable.   The  parallel  capabilities  of 
portions  of  the  method  were  brought  out  as  were  questions  concerning  data  mani- 
pulation. 

Many  results  have  been  discovered  through  the  study.   Some  of  the  more 
useful  are  presented  below: 

1.  Nonlinear  programming  theory  is  a  hodgepodge  of  particular  examples; 
hence,  one  method  is  not  acceptable  in  a  general  nonlinear  system. 

2.  The  Sequential  Unconstrained  Method  is  the  most  straightforward  of 
all  existing  theories.   It  also  has  many  computational  advantages. 

3.  Although  the  Projected  Gradient  Method  is  quite  involved,  the 
extra  calculations  often  accelerate  convergence. 

k.      A  combination  of  these  two  methods  gives  wide  latitude  in  the 
range  of  nonlinear  problems  which  can  be  solved. 

5-   A  test,  which  the  author  calls  the  locking  test,  must  be  developed 
for  controlling  the  method  being  used. 
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6.   The  Combined  Method  appears  to  be  the  best  system  for  implementa- 
tion on  an  array  computer. 

Any  system  for  solving  nonlinear  programming  problems  is  very  incom- 
plete at  best.   Hence  many  limitations  are  present  when  the  design  is  finished. 
To  compensate  for  the  unexpected  deficiencies  one  must  allow  for  an  evolutionary 
process  to  take  place.   The  system  must  change  as  particular  difficulties  or  un- 
expected types  of  applications  occur.   The  Combined  Method  leaves  room  for 
future  changes  in  many  places.   For  example,  the  user  can  supply  an  additional 
technique,  such  as  the  Cutting  Plane  Method,  if  certain  portions  of  the  feasible 
region  display  linear  characteristics.   This  kind  of  detail  must  be  developed 
after  the  design  has  been  implemented,  that  is,  if  the  design  of  the  algorithm 
makes  provision  for  these  additions. 

The  author  believes  that  the  problems  with  separable  constraints  will 
be  the  first  application  of  the  Combined  Method.   Linear  programming  problems, 
which  are  a  subset  of  the  class  of  separable  polynomials,  might  be  better  solved 
with  a  nonlinear  technique  for  very  large  size  problems;  quasi-linear  problems 
are  included  in  this  framework. 

Problems  with  general  constraints  are  the  least  defined  and  will  there- 
fore be  the  last  application  of  the  Combined  Method  to  be  fully  realized.  None- 
theless much  progress  can  be  made  with  these  applications  by  minimizing  the 
global-local  difficulties.   Effort  should  be  initiated  in  this  area. 
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APPENDIX  A 

CALCULATIONS  FOR  THE  EXAMPLE  PROBLEM 
OF  THE  PROJECTED  GRADIENT  METHOD 


Minimize 


subject  to 


Note  that 


Setting 


"3X1  " 


§1(X)  =  -x2±/k  +  2x±   -   x2  >  0 


g2(x)  =  Xlx2  -  h  >  o 


g3(X)  =  xx  >  0 


gl+(X)  =  x2  >  0  . 


Vfk  =  [-3,-1] 


Vgl(X)  =  [-Xl/2  +  2,-1] 
Vg2(X)  =  [x2,Xl] 


Xx  =  [2,3] 

g1(X1)  -  0   ,    g2(X1)  =  2  >  0 

ux  =  [vg1(x1)] 

=  [1,-1] 


U1U1  "  2 
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vl =  ^iv"1  =  -5 


Rl  ■  w*!  ■  -1 


U^  =  [-1,1] 


?i^i  =  ^i  -  UA  =  ["2'-2] 


Move  in  the  direction  of  projected  gradient  P-,^-,  •   Let  d  =  2.0  .  Now 


N 


l  =  ^l/H  ^l  II  =  [--707, -.707] 


Y  =  X  +  d-^  =  [0.586,1.586] 
W(YX)  =  g1(Y1)  =  [-0.5] 

correction  =  YA^n-1^  =  ^--25,  -25D 

Y  =  Y±   -  correction  =  [0.836,1.336] 


W(Y2)  =  [0.161] 


correction-  =  UVWn (Y  . )  =  [.756.I.U16] 

2      lv  m-1        ' 


W(Y  )  =  -O.0U7 


and  continue  until 


gl(x)  <  .005  . 


The  solution  thus  obtained  is  not  very  good  due  to  the  arbitrariness  with  which 
the  value  d  =  2  was  selected. 


Adding  a  constraint 


if  X1  =  [2,3]    ,   then  g2(X)    =  2  >  0 


if  X2  =  correction     g  (X)  =  -2.93 


By  interpolation 


d  =  j£j£  X  2.0  =  .81 


Y±   =   X  +  d-Nx  =  [1.1+28,2.428] 

W(Y1)  =  [-0.081] 

correction  =  UW(Y   )  =  [-.o4o,.o4o] 

Y2  =  Yl  "  correcti°n  =  [1.468,2.388] 

W1(Y2)  =  (.014) 

W2(Y2)  =  (-.49) 
"by  continuing  the  above  process  one  finds 

x2  =  [1.578,2.534] 
and 


gl(x2)  =  g2(x2)  =  0. 


6k 


Now  check  for  the  optimum  at  Xp.   In  this  case  U.  is  a  2  X  2  nonsingular  matrix, 
in  which  case 


since 


P  -  I  -  V^VS  "  °  ' 


Hence 


(U2U2rl  'UjVg)-1 


P2^f2  II  -  0 


R21<0 


R22  <  o  , 


therefore  X  is  a  local  optimum. 
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APPENDIX  B 

FLOW  DIAGRAMS  OF  THE 
COMBINED  METHOD 
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Cterap  =  k  ^ 
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